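This tutorial introduces some basic code for plotting and visualizing data in R.
In this short tutorial, you will learn how to make scatterplots and boxplots in base R, add axis labels and titles, change colors and plotting symbols, add a legend, and save your plots.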
First, set your working directory, just like before.
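As a reminder, here is a minimal sketch of checking and changing the working directory. The folder used below is only a stand-in (R's temporary directory, stored in the hypothetical variable `my_folder`) so that the example runs on any machine; substitute the path to your own project folder.

```r
# show the current working directory
getwd()

# stand-in folder for illustration only: replace tempdir() with your own project folder
my_folder <- tempdir()
setwd(my_folder)

# confirm that the working directory has changed
getwd()
```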
We will plot data from a dataset that comes with base R. These data were collected by Edgar Anderson in 1935 and used by R.A. Fisher (and many, many people since). The iris dataset gives the measurements in centimeters of sepal length, sepal width, petal length, and petal width for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica. Here’s what the data look like:
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## [5] "Species"
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
# here's a basic plot command: plot(x, y)
# this says: using the data frame 'iris', plot petal width on the x axis and petal length on the y axis
with(iris,
     plot(Petal.Width,
          Petal.Length))
# an alternative way to write this is to specify the data frame iris, followed by a dollar sign $, then the column name
plot(iris$Petal.Width,
     iris$Petal.Length)
# I use this notation to help me keep track of multiple columns within different data frames
# another way to make the same plot is to use a tilde ~
# In R the tilde is used to separate left and right sides of a model formula
# so you could read this as: plot Petal Length as a function of Petal Width
plot(iris$Petal.Length ~ iris$Petal.Width)
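The formula form can also be combined with a `data =` argument, which saves repeating `iris$` for every column. The sketch below is one more way to draw the same scatterplot; the variable name `petal_formula` is made up for illustration.

```r
# with data = iris, the column names in the formula are looked up inside the data frame
plot(Petal.Length ~ Petal.Width, data = iris)

# a formula is an ordinary R object, so it can be stored and reused
petal_formula <- Petal.Length ~ Petal.Width
class(petal_formula)      # the object's class
all.vars(petal_formula)   # the variable names it refers to
plot(petal_formula, data = iris)
```

Whether to write `iris$Petal.Length` or use `data = iris` is largely a matter of taste; the `$` form makes the source of each column explicit when several data frames are in play.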
# make this look nicer by adding axis labels for the x axis (xlab) and y axis (ylab), plus a title (main)
plot(iris$Petal.Length ~ iris$Petal.Width,
     xlab = "Petal Width (cm)", ylab = "Petal Length (cm)",
     main = "Data from three iris species"
)
# there is some clear separation in the points. To see whether they separate by species, give each species its own color with col= (this only works if the variable you use to define the colors is a factor, R's categorical data type)
plot(iris$Petal.Length ~ iris$Petal.Width,
     xlab = "Petal Width (cm)", ylab = "Petal Length (cm)",
     main = "Data from three iris species", col = iris$Species
)
# the same formula can be used to make a boxplot
boxplot(iris$Petal.Length ~ iris$Species)
# then color in the boxes using col= and specifying the colors as a vector, one color per box
boxplot(iris$Petal.Length ~ iris$Species,
        col = c("black", "red", "green")
)
# a note on c(): you build a vector of multiple items in R using the c() function, which stands for 'combine' (it is also often described as 'concatenate')
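A short sketch of what `c()` produces may help here. The variable names `box_colors` and `symbol_codes` are invented for this example and are not used elsewhere in the tutorial.

```r
# c() combines individual values into a single vector
box_colors <- c("black", "red", "green")

length(box_colors)   # how many elements the vector holds
box_colors[2]        # square brackets pick out elements by position

# vectors can hold numbers as well as text
symbol_codes <- c(1, 2, 18)
class(symbol_codes)
```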
Open a new R script, or just add to the chunk of code above.
Play around with adding axis labels and a title, and with changing the colors in the boxplot.
# let's go back to the iris data scatterplot
# use pch= to change the plotting symbol (pch stands for 'plotting character')
plot(iris$Petal.Length ~ iris$Petal.Width,
     xlab = "Iris Petal Width", ylab = "Iris Petal Length",
     main = "Data from three iris species", pch = 2
) # all points will be triangles
# to give each species its own symbol, index a vector of symbols by species
plot(iris$Petal.Length ~ iris$Petal.Width,
     xlab = "Iris Petal Width", ylab = "Iris Petal Length",
     main = "Data from three iris species",
     pch = c(1, 2, 18)[unclass(iris$Species)]
)
# This works in two steps. c(1, 2, 18) creates a vector of three plotting symbols.
# unclass(iris$Species) converts the species column from categories
# (a "factor" data type in R terminology) into integer codes (1, 2, or 3), one per flower.
# Indexing the symbol vector by those codes returns one symbol per flower:
c(1, 2, 18)[unclass(iris$Species)]
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [24] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [47] 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [70] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [93] 2 2 2 2 2 2 2 2 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
## [116] 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
## [139] 18 18 18 18 18 18 18 18 18 18 18 18
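The same indexing trick is easier to follow on a tiny example. The sketch below uses a made-up factor of four values (not part of the iris data) with the same three level names; the variable names are invented for illustration.

```r
# a small made-up factor with the same three levels as iris$Species
species <- factor(c("virginica", "setosa", "versicolor", "setosa"))

levels(species)             # levels are sorted alphabetically by default
codes <- unclass(species)   # position of each value within levels()
as.integer(codes)           # 3 1 2 1

symbols <- c(1, 2, 18)
picked <- symbols[codes]    # look up one symbol per observation
picked                      # 18 1 2 1
```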
# do the same thing to assign a different color to each of the 3 species
plot(iris$Petal.Length ~ iris$Petal.Width,
     xlab = "Iris Petal Width", ylab = "Iris Petal Length",
     main = "Data from three iris species",
     col = c("magenta", "dark green", "blue")[unclass(iris$Species)]
)
# add a legend: specify where it will be located, give the label for each species, and specify the colors and plotting character (symbol)
legend("topleft",
       legend = unique(iris$Species),
       col = c("magenta", "dark green", "blue"), pch = 1
)
# alternatively, you can use x,y coordinates (in the units of the plot axes) to place the legend
# bty = "n" draws the legend without a box around it
legend(1.8, 4,
       legend = unique(iris$Species),
       col = c("magenta", "dark green", "blue"), pch = 1, bty = "n"
)
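One caveat about the legend labels: `unique()` returns the species in the order they first appear in the data, while the colors are assigned by factor level. In `iris` the two orders happen to match. The sketch below, which uses a re-sorted copy of the data named `shuffled` (a name made up for this example), shows how they can differ and why `levels()` may be a safer choice for legend labels.

```r
# a copy of iris sorted from longest to shortest petal (for illustration only)
shuffled <- iris[order(iris$Petal.Length, decreasing = TRUE), ]

as.character(unique(shuffled$Species))   # order of first appearance in the rows
levels(shuffled$Species)                 # level order, which is what the color indexing uses

species_cols <- c("magenta", "dark green", "blue")
plot(shuffled$Petal.Length ~ shuffled$Petal.Width,
     xlab = "Iris Petal Width", ylab = "Iris Petal Length",
     col = species_cols[unclass(shuffled$Species)]
)
# labels taken from levels() line up with the colors regardless of row order
legend("topleft",
       legend = levels(shuffled$Species),
       col = species_cols, pch = 1
)
```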
# saving plots:
# in RStudio, you can save a plot very quickly by opening the 'Export' drop-down menu in the figure window and selecting 'Copy to Clipboard'. I use this to quickly paste figures into a Word or PowerPoint document as a way of taking notes while I work. That way I can place 2 similar graphs side by side and look closely at them, instead of clicking back and forth.
# to make a printable, high-quality figure, click on 'Save as PDF' or 'Save as Image' and then specify the size, orientation, and file name
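Plots can also be saved from code rather than through the menu, which makes a figure easy to regenerate later. This is a minimal sketch using the base R `pdf()` device; the file name `iris_scatter.pdf` and the use of `tempdir()` are placeholders chosen so the example runs anywhere, so swap in your own folder and name.

```r
# placeholder output file: replace tempdir() and the file name with your own
out_file <- file.path(tempdir(), "iris_scatter.pdf")

pdf(out_file, width = 6, height = 5)   # open a PDF device; size is in inches
plot(iris$Petal.Length ~ iris$Petal.Width,
     xlab = "Petal Width (cm)", ylab = "Petal Length (cm)",
     main = "Data from three iris species"
)
dev.off()                              # close the device so the file is written

file.exists(out_file)                  # check that the file was created
```

The `png()` function works the same way for image files, with width and height given in pixels by default.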
Here’s a link to more plotting options: https://www.statmethods.net/advgraphs/parameters.html
Make a plot of the iris data with 3 colors and 3 shapes of your choosing (one for each species), and add a matching legend.